# BYOK Key Fix Deployment Summary

**Date:** 2026-05-01

**Deployment:** atom-saas version with pre-flight API key validation

**Image:** registry.fly.io/atom-saas:deployment-01KQHXM3XJ9JY595H6NG0GXADJ

## Problem Solved

Backfill jobs were failing during LLM extraction for three reasons:

1. **Wrong key being used**: The system fell back to the global DeepSeek key ending in 6100 instead of the tenant's key ending in 4207.

2. **tenant_id not resolved**: LLMService was initialized with workspace_id as tenant_id, so the BYOK lookup failed.

3. **Silent failures**: Auth errors were caught and skipped without being surfaced, which made debugging difficult.

## Commits Deployed

### 1. 6303c13f57 - Resolve tenant_id from workspace_id

**File:** backend-saas/core/graphrag_engine.py

    # Before: Used workspace_id as tenant_id (WRONG)
    llm = LLMService(db=self.db, workspace_id=workspace_id, tenant_id=workspace_id)

    # After: Resolve tenant_id from Workspace table (CORRECT)
    w = session.query(Workspace).filter(Workspace.id == workspace_id).first()
    resolved_tenant_id = str(w.tenant_id) if w and w.tenant_id else workspace_id
    llm = LLMService(db=self.db, workspace_id=workspace_id, tenant_id=resolved_tenant_id)

**Impact:** BYOK key lookup now uses correct tenant_id
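
The resolution rule in this commit can be sketched in isolation as follows. This is a minimal illustration, not the code in graphrag_engine.py: the `Workspace` dataclass is a hypothetical stand-in for the ORM model, and `resolve_tenant_id` is a name introduced here for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workspace:
    """Hypothetical stand-in for the ORM Workspace row (assumption)."""
    id: str
    tenant_id: Optional[str] = None


def resolve_tenant_id(workspace: Optional[Workspace], workspace_id: str) -> str:
    """Return the workspace's tenant_id, falling back to workspace_id.

    Mirrors the conditional expression in the commit: a missing row or an
    empty tenant_id both fall back to the workspace_id itself.
    """
    if workspace is not None and workspace.tenant_id:
        return str(workspace.tenant_id)
    return workspace_id


# Usage with the identifiers listed under "Database State"
ws = Workspace(
    id="795c2ec9-b794-47ea-9aae-12c1c3d48589",
    tenant_id="31c06fc4-db22-4740-83ea-48ac14f25810",
)
tenant_id = resolve_tenant_id(ws, ws.id)
```

Note that the fallback branch reproduces the old behavior (workspace_id used as tenant_id) whenever the workspace row is missing, so it may be worth logging when that branch is taken.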

### 2. 077bbdc115 - Remove dangerous global key fallback

**File:** backend-saas/core/byok_endpoints.py

    # REMOVED: Dangerous fallback to global keys
    fallback_key_id = f"{provider_id}_{key_name}_{environment}"  # e.g., "deepseek_default_production"
    if fallback_key_id in self.api_keys:
        return decrypt(self.api_keys[fallback_key_id])  # ← REMOVED THIS LINE

**Impact:** No longer returns global keys when tenant key lookup fails
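
The effect of removing the fallback can be illustrated with a small lookup that only ever returns a tenant-scoped key. This is a sketch under assumptions: `get_tenant_key` and the `"{tenant_id}:{provider_id}"` storage format are hypothetical, and decryption is omitted.

```python
from typing import Dict, Optional


def get_tenant_key(
    api_keys: Dict[str, str],
    tenant_id: str,
    provider_id: str,
) -> Optional[str]:
    """Return the tenant's key for a provider, or None if it is absent.

    There is deliberately no second lookup of a global id such as
    "deepseek_default_production" when the tenant key is missing.
    """
    # Hypothetical key format, chosen only for this example.
    return api_keys.get(f"{tenant_id}:{provider_id}")


# Placeholder values; the store holds one global key and one tenant key.
store = {
    "deepseek_default_production": "global-key",
    "tenant-a:deepseek": "tenant-a-key",
}
found = get_tenant_key(store, "tenant-a", "deepseek")
missing = get_tenant_key(store, "tenant-b", "deepseek")
```

Here `missing` is `None` even though a global DeepSeek key exists in the store, so the caller has to handle the absence explicitly instead of silently billing the platform key.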

### 3. a51e194459 - Raise AuthenticationError on auth failure

**File:** Multiple files in BYOKHandler

    # Before: Silently skip on auth failure
    except Exception as e:
        logger.warning(f"Provider {provider_id} failed: {e}")
        continue  # Try next provider

    # After: Raise clear error
    raise AuthenticationError(
        f"Failed to authenticate with {provider_id}: {auth_error}"
    )

**Impact:** Clear error messages when API keys are invalid
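
One way the before/after behavior could fit together in a provider loop is sketched below. The exception classes and `call_first_available` are hypothetical stand-ins, and whether the real BYOKHandler still continues to the next provider on non-auth errors is an assumption made for this example.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger(__name__)


class AuthenticationError(Exception):
    """Hypothetical stand-in for the project's AuthenticationError."""


class ProviderAuthFailure(Exception):
    """Hypothetical error a provider client raises on a 401 response."""


def call_first_available(providers: Dict[str, Callable[[], str]]) -> str:
    """Try providers in order; surface auth failures instead of skipping."""
    for provider_id, call in providers.items():
        try:
            return call()
        except ProviderAuthFailure as auth_error:
            # A bad key is a configuration problem: stop and report it.
            raise AuthenticationError(
                f"Failed to authenticate with {provider_id}: {auth_error}"
            ) from auth_error
        except Exception as e:
            # Assumption: transient failures may still fall through.
            logger.warning(f"Provider {provider_id} failed: {e}")
            continue
    raise RuntimeError("All providers failed")
```

Chaining with `from auth_error` keeps the original provider exception in the traceback, which helps when tracing a 401 back to a specific key.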

### 4. ab01dc1cb5 - Pre-flight API key check

**File:** backend-saas/core/historical_sync_service.py

    # Check BEFORE fetching records (lines 624-648)
    openai_key = db.query(TenantSetting).filter(
        TenantSetting.tenant_id == tenant_id,
        TenantSetting.setting_key == "OPENAI_API_KEY"
    ).first()

    has_openai_key = (
        openai_key
        and openai_key.setting_value
        and not openai_key.setting_value.startswith("mock")
    )

    can_use_graphrag = has_graphrag_access and has_openai_key

    if not can_use_graphrag:
        logger.warning(
            f"Skipping backfill: Tenant {tenant_id} has no valid API key. "
            f"Add key in Settings or skip GraphRAG extraction."
        )
        return  # Stop immediately, don't fetch records

**Impact:**

- **Faster feedback:** Job stops immediately if no API key (vs fetching all records then failing)

- **Clearer logs:** Explicit warning about missing API key

- **No wasted resources:** Doesn't fetch emails that can't be processed
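
The pre-flight condition can be expressed as a pair of pure functions, which makes the rule easy to unit test without a database. This is a sketch; the function names `has_valid_key` and `can_use_graphrag` are introduced here and are not taken from historical_sync_service.py.

```python
from typing import Optional


def has_valid_key(setting_value: Optional[str]) -> bool:
    """True when a key is present, non-empty, and not a mock placeholder."""
    return bool(setting_value) and not setting_value.startswith("mock")


def can_use_graphrag(has_graphrag_access: bool, openai_key: Optional[str]) -> bool:
    """Combine the plan-level access flag with the key check."""
    return has_graphrag_access and has_valid_key(openai_key)


# Placeholder values, not real keys
ready = can_use_graphrag(True, "sk-example")
blocked_mock = can_use_graphrag(True, "mock-key")
blocked_missing = can_use_graphrag(True, None)
```

This check only establishes that a plausible key is stored. It does not call the provider, so a key that is present but rejected with a 401 still passes pre-flight and is caught later by the AuthenticationError path.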

## Expected Behavior

### Scenario 1: Tenant with Valid API Key (Brennan)

- ✅ Pre-flight check finds OPENAI_API_KEY in tenant_settings

- ✅ Resolves tenant_id from workspace_id

- ✅ Uses tenant's DEEPSEEK_API_KEY ending in 4207 (not the global fallback ending in 6100)

- ✅ LLM extraction succeeds

- ✅ Entities and relationships extracted

### Scenario 2: Tenant Without API Key

- ✅ Checks tenant_settings for OPENAI_API_KEY

- ✅ Key not found or invalid

- ✅ Logs clear warning: "Skipping backfill: Tenant {tenant_id} has no valid API key" (the tenant ID appears in the message)

- ✅ Stops immediately (doesn't fetch records)

- ✅ Job marked as failed with clear reason

### Scenario 3: Tenant with Invalid API Key

- ✅ Checks tenant_settings for OPENAI_API_KEY

- ✅ Key found but invalid (401 error)

- ✅ Raises AuthenticationError with clear message

- ✅ Job marked as failed with auth error details

## Database State

**Brennan's tenant (verified):**

    -- Tenant
    SELECT id, subdomain, plan_type FROM tenants WHERE subdomain = 'brennan';
    -- Result: 31c06fc4-db22-4740-83ea-48ac14f25810 | brennan | team

    -- Workspace
    SELECT id, tenant_id FROM workspaces WHERE tenant_id = '31c06fc4-db22-4740-83ea-48ac14f25810';
    -- Result: 795c2ec9-b794-47ea-9aae-12c1c3d48589 | 31c06fc4-db22-4740-83ea-48ac14f25810

    -- API Keys
    SELECT setting_key, LENGTH(setting_value), RIGHT(setting_value, 8)
    FROM tenant_settings
    WHERE tenant_id = '31c06fc4-db22-4740-83ea-48ac14f25810'
      AND setting_key LIKE '%API_KEY%';
    -- Results:
    --   DEEPSEEK_API_KEY    | 35  | ...4f474207  ✅ CORRECT KEY
    --   OPENAI_API_KEY      | 164 | ...CQnbyPMA  ✅ CORRECT KEY
    --   GOOGLE_API_KEY      | 39  | ...LLfokymk
    --   MINIMAX_2_7_API_KEY | 126 | ...7onMexq8

## Testing

To verify the fix:

1. **Trigger a backfill** for the brennan tenant

2. **Check logs** for: "Resolved tenant_id from workspace_id"

3. **Check logs** for: "Using tenant's DEEPSEEK_API_KEY"

4. **Verify** LLM extraction succeeds

5. **Verify** entities and relationships are created
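
The two log checks above could be automated with a small helper that reports which expected messages are absent. This is a sketch: `missing_log_messages` is a name introduced here, the sample lines are invented for illustration, and how the log lines are collected from the deployment is left open.

```python
from typing import Iterable, List

# The two messages named in the verification steps.
EXPECTED_MESSAGES = (
    "Resolved tenant_id from workspace_id",
    "Using tenant's DEEPSEEK_API_KEY",
)


def missing_log_messages(
    log_lines: Iterable[str],
    expected: Iterable[str] = EXPECTED_MESSAGES,
) -> List[str]:
    """Return the expected messages that appear in none of the log lines."""
    lines = list(log_lines)
    return [msg for msg in expected if not any(msg in line for line in lines)]


# Invented sample lines; real log formatting will differ.
sample = [
    "INFO Resolved tenant_id from workspace_id",
    "INFO Using tenant's DEEPSEEK_API_KEY",
]
missing = missing_log_messages(sample)
```

An empty `missing` list means both messages were found; any entries name the checks that still need attention.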

To test the pre-flight check:

1. **Remove** the OPENAI_API_KEY from tenant_settings temporarily

2. **Trigger** a backfill

3. **Verify** job fails immediately with a warning of the form: "Skipping backfill: Tenant {tenant_id} has no valid API key"

4. **Verify** NO records are fetched (faster feedback)

5. **Restore** the API key

## Future Work

For **managed AI tenants** (platform provides keys):

- LLMService will handle key management transparently

- No code change needed

- System will use platform keys instead of BYOK keys

The current fix ensures BYOK tenants' keys are found correctly without falling back to global keys.